{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "view-in-github",
    "colab_type": "text"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/bradleywhitlock/CDP-Exploration/blob/master/Part_4b_Data_Science_Starting_Notebook.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GBMiG1VEOkev",
    "colab_type": "text"
   },
   "source": [
    "# Part 4b: Data Science\n",
    "\n",
    "In the final part of the workshop we break out into two workstreams. This workstream continues down a more technical path, where we help you to perform advanced analytics and/or model development!\n",
    "\n",
    "Keep your notebook from part 3 close at hand, and use content from the developer workshop as reference material if you really want to leverage the ins and outs of CDP in your analysis.\n",
    "\n",
    "We can't wait to see what you come up with! Good luck :)\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WY4nwq2wOkew",
    "colab_type": "text"
   },
   "source": [
    "# Step 0: Environment setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "id": "r1LwRpNOOkex",
    "colab_type": "code",
    "colab": {}
   },
   "outputs": [],
   "source": [
    "# if you're working in google colab or similar\n",
    "!pip install -q cognite-sdk \"cognite-logger==0.3.*\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "id": "0UdHacfkOke2",
    "colab_type": "code",
    "colab": {}
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import os\n",
    "from datetime import datetime, timedelta\n",
    "from datetime import datetime\n",
    "from getpass import getpass\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.ensemble import RandomForestRegressor\n",
    "from sklearn.linear_model import LinearRegression\n",
    "\n",
    "from cognite import CogniteClient\n",
    "\n",
    "pd.set_option('display.max_rows', 10)\n",
    "\n",
    "client = CogniteClient(api_key=getpass(\"Open Industrial Data API-KEY: \"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5evhf4JqOke6",
    "colab_type": "text"
   },
   "source": [
    "# Step 1: Pull up the sub-assets and timeseries for your asset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "id": "7co9p-dxOke7",
    "colab_type": "code",
    "colab": {}
   },
   "outputs": [],
   "source": [
    "asset_id = 53231887945301"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "id": "XorpzjDOOke_",
    "colab_type": "code",
    "colab": {}
   },
   "outputs": [],
   "source": [
    "df_asset_children = client.assets.get_asset_subtree(\n",
    "    asset_id=asset_id,\n",
    "    depth=10\n",
    ").to_pandas().sort_values('depth')\n",
    "df_asset_children[['depth', 'id', 'parentId', 'description']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "id": "M3EB_QjgOkfC",
    "colab_type": "code",
    "colab": {}
   },
   "outputs": [],
   "source": [
    "df_asset_children_timeseries = client.time_series.get_time_series(path=str([asset_id])).to_pandas()\n",
    "df_asset_children_timeseries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Q846v1IUOkfF",
    "colab_type": "text"
   },
   "source": [
    "# Step 2: Explore\n",
    "\n",
    "Part 4b is an open ended investigation into Industrial data. Cogniters will be around to guide you along, and help you turn your industrial ideas into quality analysis. Here are a few projects to get you started:\n",
    "\n",
    "#### Data quality investigation\n",
    "Sometimes the edge goes down. We need to know how complete our datasets are. Play with `client.datapoints.get_datapoints_frame` to get years worth of day resolution data and come up with a good way of visualizing gaps in critical sensors. Can you spot times when the platform was shut down?\n",
    "\n",
    "#### Graph analytics\n",
    "Use the `networkx` python package (`networkx.from_pandas_dataframe`) to draw some cool visualizations of the asset hierarchy.\n",
    "\n",
    "#### Supervised sensor prediction\n",
    "Use scikit-learn regressors to build a model that uses other sensors to predict the value of one (important) sensor. Think about how this could be extended towards anomaly detection, and the potential pitfalls! A good understanding of the process will help!\n",
    "\n",
    "#### Unsupervised anomaly detection\n",
    "Build an unsupervised anomaly detection model (Isolation Forest, k-means, pca, to name a few). Conceive an anomaly detection score.\n",
    "\n",
    "\n",
    "## Tips\n",
    "- A concise, coherent analysis is more useful than a lengthy notebook that does many shiny things but doesn't reach any conclusions\n",
    "- Ask questions about the assets or read up on them ([ABB's handbook on Oil & Gas](https://library.e.abb.com/public/34d5b70e18f7d6c8c1257be500438ac3/Oil%20and%20gas%20production%20handbook%20ed3x0_web.pdf)).\n",
    "- Work together and share your results\n",
    "- Send us your notebook by opening a pull request to the [contributions folder](https://github.com/cognitedata/open-industrial-data/tree/master/contributions)! A well structured, clear analysis on your github profile and shared with the Open Industrial Data community could be your foot in the door to your next dream job in Data Science!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "id": "uGfai0Z7OkfG",
    "colab_type": "code",
    "colab": {}
   },
   "outputs": [],
   "source": [
    ""
   ]
  }
 ],
 "metadata": {
  "colab": {
   "name": "Part 4b - Data Science Starting Notebook.ipynb",
   "version": "0.3.2",
   "provenance": [],
   "include_colab_link": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
